2025-07-02T08:02:26.4661599Z Current runner version: '2.325.0' 2025-07-02T08:02:26.4667918Z Runner name: 'i-0e1b05daa0106c173' 2025-07-02T08:02:26.4668649Z Runner group name: 'default' 2025-07-02T08:02:26.4669467Z Machine name: 'ip-10-0-65-166' 2025-07-02T08:02:26.4672081Z ##[group]GITHUB_TOKEN Permissions 2025-07-02T08:02:26.4674382Z Contents: read 2025-07-02T08:02:26.4674898Z Metadata: read 2025-07-02T08:02:26.4675399Z ##[endgroup] 2025-07-02T08:02:26.4677303Z Secret source: Actions 2025-07-02T08:02:26.4677930Z Prepare workflow directory 2025-07-02T08:02:26.5224887Z Prepare all required actions 2025-07-02T08:02:26.5262480Z Getting action download info 2025-07-02T08:02:26.8091313Z Download action repository 'actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683' (SHA:11bd71901bbe5b1630ceea73d27597364c9af683) 2025-07-02T08:02:27.1157809Z Download action repository 'pytorch/pytorch@main' (SHA:0364db7cd14ffa67b48ef8c27fefbb3eed2b065d) 2025-07-02T08:02:42.2792523Z Download action repository 'actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093' (SHA:d3f86a106a0bac45b974a628896c90dbdf5c8093) 2025-07-02T08:02:42.6299976Z Download action repository 'pmeier/pytest-results-action@a2c1430e2bddadbad9f49a6f9b879f062c6b19b1' (SHA:a2c1430e2bddadbad9f49a6f9b879f062c6b19b1) 2025-07-02T08:02:42.7866280Z Download action repository 'actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02' (SHA:ea165f8d65b6e75b540449e92b4886f43607fa02) 2025-07-02T08:02:43.1894880Z Download action repository 'seemethere/upload-artifact-s3@baba72d0712b404f646cebe0730933554ebce96a' (SHA:baba72d0712b404f646cebe0730933554ebce96a) 2025-07-02T08:02:43.4431377Z Getting action download info 2025-07-02T08:02:43.5860303Z Uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@refs/heads/main (4e43bd8700fb3fac32b6155020e13e6033eb4bcb) 2025-07-02T08:02:43.5864175Z ##[group] Inputs 2025-07-02T08:02:43.5866394Z script: if [[ "refs/pull/3030/merge" =~ release/* ]]; then export RELEASE=1 export TORCH_VERSION=stable else export RELEASE=0 export TORCH_VERSION=nightly fi set -euo pipefail export PYTHON_VERSION="3.9" export CU_VERSION="cu117" export TAR_OPTIONS="--no-same-owner" export UPLOAD_CHANNEL="nightly" export TF_CPP_MIN_LOG_LEVEL=0 export TD_GET_DEFAULTS_TO_NONE=1 bash .github/unittest/linux_libs/scripts_llm/setup_env.sh bash .github/unittest/linux_libs/scripts_llm/install.sh bash .github/unittest/linux_libs/scripts_llm/run_test.sh bash .github/unittest/linux_libs/scripts_llm/post_process.sh 2025-07-02T08:02:43.5869039Z timeout: 120 2025-07-02T08:02:43.5869273Z runner: linux.g5.4xlarge.nvidia.gpu 2025-07-02T08:02:43.5869561Z upload-artifact: 2025-07-02T08:02:43.5870357Z upload-artifact-to-s3: false 2025-07-02T08:02:43.5870641Z download-artifact: 2025-07-02T08:02:43.5870867Z repository: pytorch/rl 2025-07-02T08:02:43.5871124Z fetch-depth: 1 2025-07-02T08:02:43.5871332Z submodules: 2025-07-02T08:02:43.5871529Z ref: 2025-07-02T08:02:43.5871744Z test-infra-repository: pytorch/test-infra 2025-07-02T08:02:43.5872058Z test-infra-ref: 2025-07-02T08:02:43.5872299Z use-custom-docker-registry: true 2025-07-02T08:02:43.5872588Z docker-image: nvidia/cudagl:11.4.0-base 2025-07-02T08:02:43.5872888Z docker-build-dir: .ci/docker 2025-07-02T08:02:43.5873139Z gpu-arch-type: cpu 2025-07-02T08:02:43.5873358Z gpu-arch-version: 2025-07-02T08:02:43.5873570Z job-name: linux-job 2025-07-02T08:02:43.5873803Z continue-on-error: false 2025-07-02T08:02:43.5874038Z binary-matrix: 2025-07-02T08:02:43.5874258Z run-with-docker: true 
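For readability, here is the script: input shown above rendered as a sketch with the collapsed newlines restored at what look like statement boundaries. The runner prints multi-line inputs on a single line, so the line grouping below is an assumption, not text taken from the workflow file itself:

  if [[ "refs/pull/3030/merge" =~ release/* ]]; then
    export RELEASE=1
    export TORCH_VERSION=stable
  else
    export RELEASE=0
    export TORCH_VERSION=nightly
  fi
  set -euo pipefail
  export PYTHON_VERSION="3.9"
  export CU_VERSION="cu117"
  export TAR_OPTIONS="--no-same-owner"
  export UPLOAD_CHANNEL="nightly"
  export TF_CPP_MIN_LOG_LEVEL=0
  export TD_GET_DEFAULTS_TO_NONE=1
  bash .github/unittest/linux_libs/scripts_llm/setup_env.sh
  bash .github/unittest/linux_libs/scripts_llm/install.sh
  bash .github/unittest/linux_libs/scripts_llm/run_test.sh
  bash .github/unittest/linux_libs/scripts_llm/post_process.sh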
2025-07-02T08:02:43.5874473Z secrets-env: 2025-07-02T08:02:43.5874690Z no-sudo: false 2025-07-02T08:02:43.5874930Z ##[endgroup] 2025-07-02T08:02:43.5875172Z Complete job name: unittests (3.9, 12.8) / linux-job 2025-07-02T08:02:43.6484040Z A job started hook has been configured by the self-hosted runner administrator 2025-07-02T08:02:43.6613694Z ##[group]Run '/home/ec2-user/runner-scripts/before_job.sh' 2025-07-02T08:02:43.6625465Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-07-02T08:02:43.6626346Z ##[endgroup] 2025-07-02T08:02:45.0005936Z Runner Type: linux.g5.4xlarge.nvidia.gpu 2025-07-02T08:02:45.0006364Z Instance Type: g5.4xlarge 2025-07-02T08:02:45.0006640Z AMI Name: unknown 2025-07-02T08:02:45.0050764Z AMI ID: ami-05ffe3c48a9991133 2025-07-02T08:02:50.5375423Z ##[group]Run set -euxo pipefail 2025-07-02T08:02:50.5375783Z set -euxo pipefail 2025-07-02T08:02:50.5376070Z if [[ "${NO_SUDO}" == "false" ]]; then 2025-07-02T08:02:50.5376424Z  echo "::group::Cleanup with-sudo debug output" 2025-07-02T08:02:50.5376783Z  sudo rm -rfv "${GITHUB_WORKSPACE}" 2025-07-02T08:02:50.5377065Z else 2025-07-02T08:02:50.5377312Z  echo "::group::Cleanup no-sudo debug output" 2025-07-02T08:02:50.5377650Z  rm -rfv "${GITHUB_WORKSPACE}" 2025-07-02T08:02:50.5377923Z fi 2025-07-02T08:02:50.5378120Z  2025-07-02T08:02:50.5378337Z mkdir -p "${GITHUB_WORKSPACE}" 2025-07-02T08:02:50.5378648Z echo "::endgroup::" 2025-07-02T08:02:50.5393235Z shell: /usr/bin/bash -e {0} 2025-07-02T08:02:50.5393483Z env: 2025-07-02T08:02:50.5393686Z DOCKER_IMAGE: nvidia/cudagl:11.4.0-base 2025-07-02T08:02:50.5393979Z REPOSITORY: pytorch/rl 2025-07-02T08:02:50.5394256Z PR_NUMBER: 3030 2025-07-02T08:02:50.5396400Z SCRIPT: if [[ "refs/pull/3030/merge" =~ release/* ]]; then export RELEASE=1 export TORCH_VERSION=stable else export RELEASE=0 export TORCH_VERSION=nightly fi set -euo pipefail export PYTHON_VERSION="3.9" export CU_VERSION="cu117" export TAR_OPTIONS="--no-same-owner" export UPLOAD_CHANNEL="nightly" export TF_CPP_MIN_LOG_LEVEL=0 export TD_GET_DEFAULTS_TO_NONE=1 bash .github/unittest/linux_libs/scripts_llm/setup_env.sh bash .github/unittest/linux_libs/scripts_llm/install.sh bash .github/unittest/linux_libs/scripts_llm/run_test.sh bash .github/unittest/linux_libs/scripts_llm/post_process.sh 2025-07-02T08:02:50.5398595Z NO_SUDO: false 2025-07-02T08:02:50.5398795Z ##[endgroup] 2025-07-02T08:02:50.5434566Z + [[ false == \f\a\l\s\e ]] 2025-07-02T08:02:50.5447850Z ##[group]Cleanup with-sudo debug output 2025-07-02T08:02:50.5450713Z + echo '::group::Cleanup with-sudo debug output' 2025-07-02T08:02:50.5451141Z + sudo rm -rfv /home/ec2-user/actions-runner/_work/rl/rl 2025-07-02T08:02:50.7007958Z removed directory '/home/ec2-user/actions-runner/_work/rl/rl' 2025-07-02T08:02:50.7032308Z + mkdir -p /home/ec2-user/actions-runner/_work/rl/rl 2025-07-02T08:02:50.7048870Z + echo ::endgroup:: 2025-07-02T08:02:50.7049530Z ##[endgroup] 2025-07-02T08:02:50.7167103Z ##[group]Run actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 2025-07-02T08:02:50.7167506Z with: 2025-07-02T08:02:50.7167723Z repository: pytorch/test-infra 2025-07-02T08:02:50.7167992Z path: test-infra 2025-07-02T08:02:50.7168202Z submodules: recursive 2025-07-02T08:02:50.7168676Z token: *** 2025-07-02T08:02:50.7168892Z ssh-strict: true 2025-07-02T08:02:50.7169097Z ssh-user: git 2025-07-02T08:02:50.7169323Z persist-credentials: true 2025-07-02T08:02:50.7169588Z clean: true 2025-07-02T08:02:50.7169818Z sparse-checkout-cone-mode: true 2025-07-02T08:02:50.7170095Z 
fetch-depth: 1 2025-07-02T08:02:50.7170296Z fetch-tags: false 2025-07-02T08:02:50.7170513Z show-progress: true 2025-07-02T08:02:50.7170728Z lfs: false 2025-07-02T08:02:50.7170940Z set-safe-directory: true 2025-07-02T08:02:50.7171211Z env: 2025-07-02T08:02:50.7171422Z DOCKER_IMAGE: nvidia/cudagl:11.4.0-base 2025-07-02T08:02:50.7171708Z REPOSITORY: pytorch/rl 2025-07-02T08:02:50.7171978Z PR_NUMBER: 3030 2025-07-02T08:02:50.7174168Z SCRIPT: if [[ "refs/pull/3030/merge" =~ release/* ]]; then export RELEASE=1 export TORCH_VERSION=stable else export RELEASE=0 export TORCH_VERSION=nightly fi set -euo pipefail export PYTHON_VERSION="3.9" export CU_VERSION="cu117" export TAR_OPTIONS="--no-same-owner" export UPLOAD_CHANNEL="nightly" export TF_CPP_MIN_LOG_LEVEL=0 export TD_GET_DEFAULTS_TO_NONE=1 bash .github/unittest/linux_libs/scripts_llm/setup_env.sh bash .github/unittest/linux_libs/scripts_llm/install.sh bash .github/unittest/linux_libs/scripts_llm/run_test.sh bash .github/unittest/linux_libs/scripts_llm/post_process.sh 2025-07-02T08:02:50.7176572Z ##[endgroup] 2025-07-02T08:02:50.8573768Z Syncing repository: pytorch/test-infra 2025-07-02T08:02:50.8574650Z ##[group]Getting Git version info 2025-07-02T08:02:50.8575162Z Working directory is '/home/ec2-user/actions-runner/_work/rl/rl/test-infra' 2025-07-02T08:02:50.8575922Z [command]/usr/bin/git version 2025-07-02T08:02:50.8580985Z git version 2.47.1 2025-07-02T08:02:50.8607081Z ##[endgroup] 2025-07-02T08:02:50.8630872Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/5745cc2b-4f16-4135-9b63-99d237c3cfcc' before making global git config changes 2025-07-02T08:02:50.8631924Z Adding repository directory to the temporary git global config as a safe directory 2025-07-02T08:02:50.8636171Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/rl/rl/test-infra 2025-07-02T08:02:50.8675090Z ##[group]Initializing the repository 2025-07-02T08:02:50.8679451Z [command]/usr/bin/git init /home/ec2-user/actions-runner/_work/rl/rl/test-infra 2025-07-02T08:02:50.8725366Z hint: Using 'master' as the name for the initial branch. This default branch name 2025-07-02T08:02:50.8726060Z hint: is subject to change. To configure the initial branch name to use in all 2025-07-02T08:02:50.8726653Z hint: of your new repositories, which will suppress this warning, call: 2025-07-02T08:02:50.8727200Z hint: 2025-07-02T08:02:50.8727584Z hint: git config --global init.defaultBranch 2025-07-02T08:02:50.8746893Z hint: 2025-07-02T08:02:50.8747306Z hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and 2025-07-02T08:02:50.8747854Z hint: 'development'. 
The just-created branch can be renamed via this command: 2025-07-02T08:02:50.8748260Z hint: 2025-07-02T08:02:50.8748523Z hint: git branch -m 2025-07-02T08:02:50.8749017Z Initialized empty Git repository in /home/ec2-user/actions-runner/_work/rl/rl/test-infra/.git/ 2025-07-02T08:02:50.8750207Z [command]/usr/bin/git remote add origin https://github.com/pytorch/test-infra 2025-07-02T08:02:50.8775913Z ##[endgroup] 2025-07-02T08:02:50.8776339Z ##[group]Disabling automatic garbage collection 2025-07-02T08:02:50.8780444Z [command]/usr/bin/git config --local gc.auto 0 2025-07-02T08:02:50.8815095Z ##[endgroup] 2025-07-02T08:02:50.8815490Z ##[group]Setting up auth 2025-07-02T08:02:50.8821691Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-07-02T08:02:50.9059869Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-07-02T08:02:50.9504803Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-07-02T08:02:50.9541147Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-07-02T08:02:50.9962486Z [command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-07-02T08:02:51.0019408Z ##[endgroup] 2025-07-02T08:02:51.0019851Z ##[group]Determining the default branch 2025-07-02T08:02:51.0022601Z Retrieving the default branch name 2025-07-02T08:02:51.2277890Z Default branch 'main' 2025-07-02T08:02:51.2284229Z ##[endgroup] 2025-07-02T08:02:51.2284664Z ##[group]Fetching the repository 2025-07-02T08:02:51.2285354Z [command]/usr/bin/git -c protocol.version=2 fetch --no-tags --prune --no-recurse-submodules --depth=1 origin +refs/heads/main:refs/remotes/origin/main 2025-07-02T08:02:51.6988388Z From https://github.com/pytorch/test-infra 2025-07-02T08:02:51.6988829Z * [new branch] main -> origin/main 2025-07-02T08:02:51.7019589Z ##[endgroup] 2025-07-02T08:02:51.7019986Z ##[group]Determining the checkout info 2025-07-02T08:02:51.7020847Z ##[endgroup] 2025-07-02T08:02:51.7025284Z [command]/usr/bin/git sparse-checkout disable 2025-07-02T08:02:51.7069875Z [command]/usr/bin/git config --local --unset-all extensions.worktreeConfig 2025-07-02T08:02:51.7103449Z ##[group]Checking out the ref 2025-07-02T08:02:51.7107205Z [command]/usr/bin/git checkout --progress --force -B main refs/remotes/origin/main 2025-07-02T08:02:51.8679441Z Switched to a new branch 'main' 2025-07-02T08:02:51.8683783Z branch 'main' set up to track 'origin/main'. 
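The checkout of pytorch/test-infra above boils down to the following sequence, a sketch assembled from the git commands echoed in the log; the base64 token placeholder is an assumption, since the real value is masked as *** above:

  # Shallow, single-branch checkout of pytorch/test-infra, as performed by actions/checkout
  git init test-infra
  cd test-infra
  git remote add origin https://github.com/pytorch/test-infra
  git config --local gc.auto 0   # disable automatic garbage collection
  # Auth is injected as an extra HTTP header (value masked in the log)
  git config --local http.https://github.com/.extraheader "AUTHORIZATION: basic <base64-token>"
  git -c protocol.version=2 fetch --no-tags --prune --no-recurse-submodules --depth=1 \
      origin +refs/heads/main:refs/remotes/origin/main
  git checkout --progress --force -B main refs/remotes/origin/main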
2025-07-02T08:02:51.8698212Z ##[endgroup] 2025-07-02T08:02:51.8698617Z ##[group]Setting up auth for fetching submodules 2025-07-02T08:02:51.8704041Z [command]/usr/bin/git config --global http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-07-02T08:02:51.8757572Z [command]/usr/bin/git config --global --unset-all url.https://github.com/.insteadOf 2025-07-02T08:02:51.8794617Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf git@github.com: 2025-07-02T08:02:51.8831422Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf org-21003710@github.com: 2025-07-02T08:02:51.8865326Z ##[endgroup] 2025-07-02T08:02:51.8865714Z ##[group]Fetching submodules 2025-07-02T08:02:51.8869264Z [command]/usr/bin/git submodule sync --recursive 2025-07-02T08:02:51.9261126Z [command]/usr/bin/git -c protocol.version=2 submodule update --init --force --depth=1 --recursive 2025-07-02T08:02:51.9655669Z [command]/usr/bin/git submodule foreach --recursive git config --local gc.auto 0 2025-07-02T08:02:52.0039449Z ##[endgroup] 2025-07-02T08:02:52.0039855Z ##[group]Persisting credentials for submodules 2025-07-02T08:02:52.0044918Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 'url.https://github.com/.insteadOf' || :" 2025-07-02T08:02:52.0429568Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local 'http.https://github.com/.extraheader' 'AUTHORIZATION: basic ***' && git config --local --show-origin --name-only --get-regexp remote.origin.url" 2025-07-02T08:02:52.0816043Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'git@github.com:' 2025-07-02T08:02:52.1194902Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'org-21003710@github.com:' 2025-07-02T08:02:52.1581250Z ##[endgroup] 2025-07-02T08:02:52.1626444Z [command]/usr/bin/git log -1 --format=%H 2025-07-02T08:02:52.1657443Z 4e43bd8700fb3fac32b6155020e13e6033eb4bcb 2025-07-02T08:02:52.1898093Z Prepare all required actions 2025-07-02T08:02:52.1898551Z Getting action download info 2025-07-02T08:02:52.3109328Z Download action repository 'pytorch/test-infra@main' (SHA:4e43bd8700fb3fac32b6155020e13e6033eb4bcb) 2025-07-02T08:02:54.2578841Z Getting action download info 2025-07-02T08:02:54.3588079Z Download action repository 'nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482' (SHA:3e91a01664abd3c5cd539100d10d33b9c5b68482) 2025-07-02T08:02:54.5346501Z ##[group]Run ./test-infra/.github/actions/setup-linux 2025-07-02T08:02:54.5346824Z env: 2025-07-02T08:02:54.5347038Z DOCKER_IMAGE: nvidia/cudagl:11.4.0-base 2025-07-02T08:02:54.5347324Z REPOSITORY: pytorch/rl 2025-07-02T08:02:54.5347562Z PR_NUMBER: 3030 2025-07-02T08:02:54.5349692Z SCRIPT: if [[ "refs/pull/3030/merge" =~ release/* ]]; then export RELEASE=1 export TORCH_VERSION=stable else export RELEASE=0 export TORCH_VERSION=nightly fi set -euo pipefail export PYTHON_VERSION="3.9" export CU_VERSION="cu117" export TAR_OPTIONS="--no-same-owner" export UPLOAD_CHANNEL="nightly" export TF_CPP_MIN_LOG_LEVEL=0 export TD_GET_DEFAULTS_TO_NONE=1 bash .github/unittest/linux_libs/scripts_llm/setup_env.sh bash .github/unittest/linux_libs/scripts_llm/install.sh bash .github/unittest/linux_libs/scripts_llm/run_test.sh bash .github/unittest/linux_libs/scripts_llm/post_process.sh 
2025-07-02T08:02:54.5352038Z ##[endgroup] 2025-07-02T08:02:54.5435350Z ##[group]Run set -euo pipefail 2025-07-02T08:02:54.5435642Z set -euo pipefail 2025-07-02T08:02:54.5435899Z function get_ec2_metadata() { 2025-07-02T08:02:54.5436232Z  # Pulled from instance metadata endpoint for EC2 2025-07-02T08:02:54.5436809Z  # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html 2025-07-02T08:02:54.5437705Z  category=$1 2025-07-02T08:02:54.5438527Z  curl -H "X-aws-ec2-metadata-token: $(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 30")" -fsSL "http://169.254.169.254/latest/meta-data/${category}" 2025-07-02T08:02:54.5439424Z } 2025-07-02T08:02:54.5439654Z echo "ami-id: $(get_ec2_metadata ami-id)" 2025-07-02T08:02:54.5440037Z echo "instance-id: $(get_ec2_metadata instance-id)" 2025-07-02T08:02:54.5440462Z echo "instance-type: $(get_ec2_metadata instance-type)" 2025-07-02T08:02:54.5440847Z echo "system info $(uname -a)" 2025-07-02T08:02:54.5449851Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-07-02T08:02:54.5450190Z env: 2025-07-02T08:02:54.5450392Z DOCKER_IMAGE: nvidia/cudagl:11.4.0-base 2025-07-02T08:02:54.5450682Z REPOSITORY: pytorch/rl 2025-07-02T08:02:54.5450910Z PR_NUMBER: 3030 2025-07-02T08:02:54.5453010Z SCRIPT: if [[ "refs/pull/3030/merge" =~ release/* ]]; then export RELEASE=1 export TORCH_VERSION=stable else export RELEASE=0 export TORCH_VERSION=nightly fi set -euo pipefail export PYTHON_VERSION="3.9" export CU_VERSION="cu117" export TAR_OPTIONS="--no-same-owner" export UPLOAD_CHANNEL="nightly" export TF_CPP_MIN_LOG_LEVEL=0 export TD_GET_DEFAULTS_TO_NONE=1 bash .github/unittest/linux_libs/scripts_llm/setup_env.sh bash .github/unittest/linux_libs/scripts_llm/install.sh bash .github/unittest/linux_libs/scripts_llm/run_test.sh bash .github/unittest/linux_libs/scripts_llm/post_process.sh 2025-07-02T08:02:54.5455208Z ##[endgroup] 2025-07-02T08:02:54.5619683Z ami-id: ami-05ffe3c48a9991133 2025-07-02T08:02:54.5736944Z instance-id: i-0e1b05daa0106c173 2025-07-02T08:02:54.5863094Z instance-type: g5.4xlarge 2025-07-02T08:02:54.5878895Z system info Linux ip-10-0-65-166.ec2.internal 6.1.141-155.222.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Jun 17 10:29:47 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux 2025-07-02T08:02:54.5921026Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-07-02T08:02:54.5922079Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-07-02T08:02:54.5931337Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-07-02T08:02:54.5931682Z env: 2025-07-02T08:02:54.5931899Z DOCKER_IMAGE: nvidia/cudagl:11.4.0-base 2025-07-02T08:02:54.5932190Z REPOSITORY: pytorch/rl 2025-07-02T08:02:54.5932442Z PR_NUMBER: 3030 2025-07-02T08:02:54.5934542Z SCRIPT: if [[ "refs/pull/3030/merge" =~ release/* ]]; then export RELEASE=1 export TORCH_VERSION=stable else export RELEASE=0 export TORCH_VERSION=nightly fi set -euo pipefail export PYTHON_VERSION="3.9" export CU_VERSION="cu117" export TAR_OPTIONS="--no-same-owner" export UPLOAD_CHANNEL="nightly" export TF_CPP_MIN_LOG_LEVEL=0 export TD_GET_DEFAULTS_TO_NONE=1 bash .github/unittest/linux_libs/scripts_llm/setup_env.sh bash .github/unittest/linux_libs/scripts_llm/install.sh bash .github/unittest/linux_libs/scripts_llm/run_test.sh bash 
.github/unittest/linux_libs/scripts_llm/post_process.sh 2025-07-02T08:02:54.5936692Z ##[endgroup] 2025-07-02T08:02:54.6055287Z ##[group]Run if systemctl is-active --quiet docker; then 2025-07-02T08:02:54.6055687Z if systemctl is-active --quiet docker; then 2025-07-02T08:02:54.6056036Z  echo "Docker daemon is running..."; 2025-07-02T08:02:54.6056321Z else 2025-07-02T08:02:54.6056828Z  echo "Starting docker deamon..." && sudo systemctl start docker; 2025-07-02T08:02:54.6057201Z fi 2025-07-02T08:02:54.6066114Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-07-02T08:02:54.6066445Z env: 2025-07-02T08:02:54.6066651Z DOCKER_IMAGE: nvidia/cudagl:11.4.0-base 2025-07-02T08:02:54.6066939Z REPOSITORY: pytorch/rl 2025-07-02T08:02:54.6067171Z PR_NUMBER: 3030 2025-07-02T08:02:54.6069288Z SCRIPT: if [[ "refs/pull/3030/merge" =~ release/* ]]; then export RELEASE=1 export TORCH_VERSION=stable else export RELEASE=0 export TORCH_VERSION=nightly fi set -euo pipefail export PYTHON_VERSION="3.9" export CU_VERSION="cu117" export TAR_OPTIONS="--no-same-owner" export UPLOAD_CHANNEL="nightly" export TF_CPP_MIN_LOG_LEVEL=0 export TD_GET_DEFAULTS_TO_NONE=1 bash .github/unittest/linux_libs/scripts_llm/setup_env.sh bash .github/unittest/linux_libs/scripts_llm/install.sh bash .github/unittest/linux_libs/scripts_llm/run_test.sh bash .github/unittest/linux_libs/scripts_llm/post_process.sh 2025-07-02T08:02:54.6071441Z ##[endgroup] 2025-07-02T08:02:54.6171791Z Docker daemon is running... 2025-07-02T08:02:54.6204398Z ##[group]Run AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") 2025-07-02T08:02:54.6204978Z AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") 2025-07-02T08:02:54.6205444Z retry () { "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") } 2025-07-02T08:02:54.6205992Z retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ 2025-07-02T08:02:54.6206669Z  --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" 2025-07-02T08:02:54.6215356Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-07-02T08:02:54.6215692Z env: 2025-07-02T08:02:54.6215895Z DOCKER_IMAGE: nvidia/cudagl:11.4.0-base 2025-07-02T08:02:54.6216187Z REPOSITORY: pytorch/rl 2025-07-02T08:02:54.6216414Z PR_NUMBER: 3030 2025-07-02T08:02:54.6218529Z SCRIPT: if [[ "refs/pull/3030/merge" =~ release/* ]]; then export RELEASE=1 export TORCH_VERSION=stable else export RELEASE=0 export TORCH_VERSION=nightly fi set -euo pipefail export PYTHON_VERSION="3.9" export CU_VERSION="cu117" export TAR_OPTIONS="--no-same-owner" export UPLOAD_CHANNEL="nightly" export TF_CPP_MIN_LOG_LEVEL=0 export TD_GET_DEFAULTS_TO_NONE=1 bash .github/unittest/linux_libs/scripts_llm/setup_env.sh bash .github/unittest/linux_libs/scripts_llm/install.sh bash .github/unittest/linux_libs/scripts_llm/run_test.sh bash .github/unittest/linux_libs/scripts_llm/post_process.sh 2025-07-02T08:02:54.6220858Z AWS_RETRY_MODE: standard 2025-07-02T08:02:54.6221127Z AWS_MAX_ATTEMPTS: 5 2025-07-02T08:02:54.6221356Z AWS_DEFAULT_REGION: us-east-1 2025-07-02T08:02:54.6221605Z ##[endgroup] 2025-07-02T08:02:55.6680699Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2025-07-02T08:02:55.6681258Z Configure a credential helper to remove this warning. 
See 2025-07-02T08:02:55.6682063Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2025-07-02T08:02:55.6682534Z 2025-07-02T08:02:55.6682659Z Login Succeeded 2025-07-02T08:02:55.6738222Z ##[group]Run env | grep '^GITHUB' >> "${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" 2025-07-02T08:02:55.6738763Z env | grep '^GITHUB' >> "${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" 2025-07-02T08:02:55.6739221Z env | grep '^CI' >> "${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" 2025-07-02T08:02:55.6749101Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-07-02T08:02:55.6749431Z env: 2025-07-02T08:02:55.6749654Z DOCKER_IMAGE: nvidia/cudagl:11.4.0-base 2025-07-02T08:02:55.6749941Z REPOSITORY: pytorch/rl 2025-07-02T08:02:55.6750181Z PR_NUMBER: 3030 2025-07-02T08:02:55.6752281Z SCRIPT: if [[ "refs/pull/3030/merge" =~ release/* ]]; then export RELEASE=1 export TORCH_VERSION=stable else export RELEASE=0 export TORCH_VERSION=nightly fi set -euo pipefail export PYTHON_VERSION="3.9" export CU_VERSION="cu117" export TAR_OPTIONS="--no-same-owner" export UPLOAD_CHANNEL="nightly" export TF_CPP_MIN_LOG_LEVEL=0 export TD_GET_DEFAULTS_TO_NONE=1 bash .github/unittest/linux_libs/scripts_llm/setup_env.sh bash .github/unittest/linux_libs/scripts_llm/install.sh bash .github/unittest/linux_libs/scripts_llm/run_test.sh bash .github/unittest/linux_libs/scripts_llm/post_process.sh 2025-07-02T08:02:55.6754648Z ##[endgroup] 2025-07-02T08:02:55.6860841Z ##[group]Run RUNNER_ARTIFACT_DIR="${RUNNER_TEMP}/artifacts" 2025-07-02T08:02:55.6861281Z RUNNER_ARTIFACT_DIR="${RUNNER_TEMP}/artifacts" 2025-07-02T08:02:55.6861659Z sudo rm -rf "${RUNNER_ARTIFACT_DIR}" 2025-07-02T08:02:55.6861977Z mkdir -p "${RUNNER_ARTIFACT_DIR}" 2025-07-02T08:02:55.6862382Z echo "RUNNER_ARTIFACT_DIR=${RUNNER_ARTIFACT_DIR}" >> "${GITHUB_ENV}" 2025-07-02T08:02:55.6862757Z  2025-07-02T08:02:55.6863038Z RUNNER_TEST_RESULTS_DIR="${RUNNER_TEMP}/test-results" 2025-07-02T08:02:55.6863432Z sudo rm -rf "${RUNNER_TEST_RESULTS_DIR}" 2025-07-02T08:02:55.6863767Z mkdir -p "${RUNNER_TEST_RESULTS_DIR}" 2025-07-02T08:02:55.6864198Z echo "RUNNER_TEST_RESULTS_DIR=${RUNNER_TEST_RESULTS_DIR}" >> "${GITHUB_ENV}" 2025-07-02T08:02:55.6864607Z  2025-07-02T08:02:55.6864825Z RUNNER_DOCS_DIR="${RUNNER_TEMP}/docs" 2025-07-02T08:02:55.6865130Z sudo rm -rf "${RUNNER_DOCS_DIR}" 2025-07-02T08:02:55.6865430Z mkdir -p "${RUNNER_DOCS_DIR}" 2025-07-02T08:02:55.6865792Z echo "RUNNER_DOCS_DIR=${RUNNER_DOCS_DIR}" >> "${GITHUB_ENV}" 2025-07-02T08:02:55.6874794Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-07-02T08:02:55.6875123Z env: 2025-07-02T08:02:55.6875330Z DOCKER_IMAGE: nvidia/cudagl:11.4.0-base 2025-07-02T08:02:55.6875617Z REPOSITORY: pytorch/rl 2025-07-02T08:02:55.6875838Z PR_NUMBER: 3030 2025-07-02T08:02:55.6877944Z SCRIPT: if [[ "refs/pull/3030/merge" =~ release/* ]]; then export RELEASE=1 export TORCH_VERSION=stable else export RELEASE=0 export TORCH_VERSION=nightly fi set -euo pipefail export PYTHON_VERSION="3.9" export CU_VERSION="cu117" export TAR_OPTIONS="--no-same-owner" export UPLOAD_CHANNEL="nightly" export TF_CPP_MIN_LOG_LEVEL=0 export TD_GET_DEFAULTS_TO_NONE=1 bash .github/unittest/linux_libs/scripts_llm/setup_env.sh bash .github/unittest/linux_libs/scripts_llm/install.sh bash .github/unittest/linux_libs/scripts_llm/run_test.sh bash .github/unittest/linux_libs/scripts_llm/post_process.sh 2025-07-02T08:02:55.6880194Z ##[endgroup] 2025-07-02T08:02:56.2879497Z ##[group]Run needs=0 2025-07-02T08:02:56.2879744Z needs=0 
2025-07-02T08:02:56.2880083Z if lspci -v | grep -e 'controller.*NVIDIA' >/dev/null 2>/dev/null; then 2025-07-02T08:02:56.2880482Z  needs=1 2025-07-02T08:02:56.2880680Z fi 2025-07-02T08:02:56.2880907Z echo "does=${needs}" >> $GITHUB_OUTPUT 2025-07-02T08:02:56.2889951Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-07-02T08:02:56.2890305Z env: 2025-07-02T08:02:56.2890509Z DOCKER_IMAGE: nvidia/cudagl:11.4.0-base 2025-07-02T08:02:56.2890798Z REPOSITORY: pytorch/rl 2025-07-02T08:02:56.2891038Z PR_NUMBER: 3030 2025-07-02T08:02:56.2893133Z SCRIPT: if [[ "refs/pull/3030/merge" =~ release/* ]]; then export RELEASE=1 export TORCH_VERSION=stable else export RELEASE=0 export TORCH_VERSION=nightly fi set -euo pipefail export PYTHON_VERSION="3.9" export CU_VERSION="cu117" export TAR_OPTIONS="--no-same-owner" export UPLOAD_CHANNEL="nightly" export TF_CPP_MIN_LOG_LEVEL=0 export TD_GET_DEFAULTS_TO_NONE=1 bash .github/unittest/linux_libs/scripts_llm/setup_env.sh bash .github/unittest/linux_libs/scripts_llm/install.sh bash .github/unittest/linux_libs/scripts_llm/run_test.sh bash .github/unittest/linux_libs/scripts_llm/post_process.sh 2025-07-02T08:02:56.2895406Z RUNNER_ARTIFACT_DIR: /home/ec2-user/actions-runner/_work/_temp/artifacts 2025-07-02T08:02:56.2895943Z RUNNER_TEST_RESULTS_DIR: /home/ec2-user/actions-runner/_work/_temp/test-results 2025-07-02T08:02:56.2896596Z RUNNER_DOCS_DIR: /home/ec2-user/actions-runner/_work/_temp/docs 2025-07-02T08:02:56.2896943Z ##[endgroup] 2025-07-02T08:02:56.3255614Z ##[group]Run pytorch/test-infra/.github/actions/setup-nvidia@main 2025-07-02T08:02:56.3255982Z with: 2025-07-02T08:02:56.3256183Z driver-version: 570.133.07 2025-07-02T08:02:56.3256412Z env: 2025-07-02T08:02:56.3256618Z DOCKER_IMAGE: nvidia/cudagl:11.4.0-base 2025-07-02T08:02:56.3256911Z REPOSITORY: pytorch/rl 2025-07-02T08:02:56.3257143Z PR_NUMBER: 3030 2025-07-02T08:02:56.3259257Z SCRIPT: if [[ "refs/pull/3030/merge" =~ release/* ]]; then export RELEASE=1 export TORCH_VERSION=stable else export RELEASE=0 export TORCH_VERSION=nightly fi set -euo pipefail export PYTHON_VERSION="3.9" export CU_VERSION="cu117" export TAR_OPTIONS="--no-same-owner" export UPLOAD_CHANNEL="nightly" export TF_CPP_MIN_LOG_LEVEL=0 export TD_GET_DEFAULTS_TO_NONE=1 bash .github/unittest/linux_libs/scripts_llm/setup_env.sh bash .github/unittest/linux_libs/scripts_llm/install.sh bash .github/unittest/linux_libs/scripts_llm/run_test.sh bash .github/unittest/linux_libs/scripts_llm/post_process.sh 2025-07-02T08:02:56.3261545Z RUNNER_ARTIFACT_DIR: /home/ec2-user/actions-runner/_work/_temp/artifacts 2025-07-02T08:02:56.3262088Z RUNNER_TEST_RESULTS_DIR: /home/ec2-user/actions-runner/_work/_temp/test-results 2025-07-02T08:02:56.3262587Z RUNNER_DOCS_DIR: /home/ec2-user/actions-runner/_work/_temp/docs 2025-07-02T08:02:56.3262934Z ##[endgroup] 2025-07-02T08:02:56.3305647Z ##[group]Run nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482 2025-07-02T08:02:56.3306026Z with: 2025-07-02T08:02:56.3306206Z timeout_minutes: 10 2025-07-02T08:02:56.3306428Z max_attempts: 3 2025-07-02T08:02:56.3331284Z command: # Is it disgusting to have a full shell script here in this github action? Sure # But is it the best way to make it so that this action relies on nothing else? Absolutely set -eou pipefail DISTRIBUTION=$(. 
/etc/os-release;echo $ID$VERSION_ID) DRIVER_FN="NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run" install_nvidia_docker2_amzn2() { ( set -x # Needed for yum-config-manager sudo yum install -y yum-utils if [[ "${DISTRIBUTION}" == "amzn2023" ]] ; then YUM_REPO_URL="https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo" else # Amazon Linux 2 YUM_REPO_URL="https://nvidia.github.io/nvidia-docker/${DISTRIBUTION}/nvidia-docker.repo" fi sudo yum-config-manager --add-repo "${YUM_REPO_URL}" sudo yum install -y \ nvidia-docker2 \ nvidia-container-toolkit-1.16.2 \ libnvidia-container-tools-1.16.2 \ libnvidia-container1-1.16.2 \ nvidia-container-toolkit-base-1.16.2 sudo systemctl restart docker ) } install_nvidia_docker2_ubuntu20() { ( set -x # Install nvidia-driver package if not installed status="$(dpkg-query -W --showformat='${db:Status-Status}' nvidia-docker2 2>&1)" if [ ! $? = 0 ] || [ ! "$status" = installed ]; then sudo apt-get install -y nvidia-docker2 nvidia-container-toolkit-1.16.2 sudo systemctl restart docker fi ) } pre_install_nvidia_driver_amzn2() { ( # Purge any nvidia driver installed from RHEL repo sudo yum remove -y nvidia-driver-latest-dkms ) } install_nvidia_driver_common() { ( # Try to gather more information about the runner and its existing NVIDIA driver if any echo "Before installing NVIDIA driver" lspci lsmod modinfo nvidia || true HAS_NVIDIA_DRIVER=0 # Check if NVIDIA driver has already been installed if [ -x "$(command -v nvidia-smi)" ]; then set +e # The driver exists, check its version next. Also check only the first GPU if there are more than one of them # so that the same driver version is not print over multiple lines INSTALLED_DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0) NVIDIA_SMI_STATUS=$? if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then echo "Failed to get NVIDIA driver version ($INSTALLED_DRIVER_VERSION). Continuing" elif [ "$INSTALLED_DRIVER_VERSION" != "$DRIVER_VERSION" ]; then echo "NVIDIA driver ($INSTALLED_DRIVER_VERSION) has been installed, but we expect to have $DRIVER_VERSION instead. Continuing" # Turn off persistent mode so that the installation script can unload the kernel module sudo killall nvidia-persistenced || true else HAS_NVIDIA_DRIVER=1 echo "NVIDIA driver ($INSTALLED_DRIVER_VERSION) has already been installed. Skipping NVIDIA driver installation" fi set -e fi if [ "$HAS_NVIDIA_DRIVER" -eq 0 ]; then # CAUTION: this may need to be updated in future if [ "${DISTRIBUTION}" != ubuntu20.04 ]; then sudo yum groupinstall -y "Development Tools" # ensure our kernel install is the same as our underlying kernel, # groupinstall "Development Tools" has a habit of mismatching kernel headers sudo yum install -y "kernel-devel-uname-r == $(uname -r)" sudo modprobe backlight fi sudo curl -fsL -o /tmp/nvidia_driver "https://s3.amazonaws.com/ossci-linux/nvidia_driver/$DRIVER_FN" set +e sudo /bin/bash /tmp/nvidia_driver -s --no-drm NVIDIA_INSTALLATION_STATUS=$? RESET_GPU=0 if [ "$NVIDIA_INSTALLATION_STATUS" -ne 0 ]; then sudo cat /var/log/nvidia-installer.log # Fail to install NVIDIA driver, try to reset the GPU RESET_GPU=1 elif [ -x "$(command -v nvidia-smi)" ]; then # Check again if nvidia-smi works even if the driver installation completes successfully INSTALLED_DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0) NVIDIA_SMI_STATUS=$? 
if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then RESET_GPU=1 fi fi if [ "$RESET_GPU" -eq 1 ]; then NVIDIA_DEVICES=$(lspci -D | grep -i NVIDIA | cut -d' ' -f1) # The GPU can get stuck in a failure state if somehow the test crashs the GPU microcode. When this # happens, we'll try to reset all NVIDIA devices https://github.com/pytorch/pytorch/issues/88388 for PCI_ID in $NVIDIA_DEVICES; do DEVICE_ENABLED=$(cat /sys/bus/pci/devices/$PCI_ID/enable) echo "Reseting $PCI_ID (enabled state: $DEVICE_ENABLED)" # This requires sudo permission of course echo "1" | sudo tee /sys/bus/pci/devices/$PCI_ID/reset sleep 1 done fi sudo rm -fv /tmp/nvidia_driver set -e fi ) } post_install_nvidia_driver_common() { ( sudo modprobe nvidia || true echo "After installing NVIDIA driver" lspci lsmod modinfo nvidia || true ( set +e nvidia-smi # NB: Annoyingly, nvidia-smi command returns successfully with return code 0 even in # the case where the driver has already crashed as it still can get the driver version # and some basic information like the bus ID. However, the rest of the information # would be missing (ERR!), for example: # # +-----------------------------------------------------------------------------+ # | NVIDIA-SMI 525.89.02 Driver Version: 525.89.02 CUDA Version: 12.0 | # |-------------------------------+----------------------+----------------------+ # | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | # | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | # | | | MIG M. | # |===============================+======================+======================| # | 0 ERR! Off | 00000000:00:1E.0 Off | ERR! | # |ERR! ERR! ERR! ERR! / ERR! | 4184MiB / 23028MiB | ERR! Default | # | | | ERR! | # +-------------------------------+----------------------+----------------------+ # # +-----------------------------------------------------------------------------+ # | Processes: | # | GPU GI CI PID Type Process name GPU Memory | # | ID ID Usage | # |=============================================================================| # +-----------------------------------------------------------------------------+ # # This should be reported as a failure instead as it will guarantee to fail when # Docker tries to run with --gpus all # # So, the correct check here is to query one of the missing piece of info like # GPU name, so that the command can fail accordingly nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 NVIDIA_SMI_STATUS=$? 
# Allowable exit statuses for nvidia-smi, see: https://github.com/NVIDIA/gpu-operator/issues/285 if [ "$NVIDIA_SMI_STATUS" -eq 0 ] || [ "$NVIDIA_SMI_STATUS" -eq 14 ]; then echo "INFO: Ignoring allowed status ${NVIDIA_SMI_STATUS}" else echo "ERROR: nvidia-smi exited with unresolved status ${NVIDIA_SMI_STATUS}" exit ${NVIDIA_SMI_STATUS} fi set -e ) ) } install_nvidia_driver_amzn2() { ( set -x pre_install_nvidia_driver_amzn2 install_nvidia_driver_common post_install_nvidia_driver_common ) } install_nvidia_driver_ubuntu20() { ( set -x install_nvidia_driver_common post_install_nvidia_driver_common ) } echo "== Installing nvidia driver ${DRIVER_FN} ==" case "${DISTRIBUTION}" in amzn*) install_nvidia_driver_amzn2 ;; ubuntu20.04) install_nvidia_driver_ubuntu20 ;; *) echo "ERROR: Unknown distribution ${DISTRIBUTION}" exit 1 ;; esac # Install container toolkit based on distribution echo "== Installing nvidia container toolkit for ${DISTRIBUTION} ==" case "${DISTRIBUTION}" in amzn*) install_nvidia_docker2_amzn2 ;; ubuntu20.04) install_nvidia_docker2_ubuntu20 ;; *) echo "ERROR: Unknown distribution ${DISTRIBUTION}" exit 1 ;; esac echo "GPU_FLAG=--gpus all -e NVIDIA_DRIVER_CAPABILITIES=all" >> "${GITHUB_ENV}" # Fix https://github.com/NVIDIA/nvidia-docker/issues/1648 on runners with # more than one GPUs. This just needs to be run once. The command fails # on subsequent runs and complains that the mode is already on, but that's # ok sudo nvidia-persistenced || true # This should show persistence mode ON nvidia-smi # check if the container-toolkit is correctly installed and CUDA is available inside a container docker run --rm -t --gpus=all public.ecr.aws/docker/library/python:3.13 nvidia-smi 2025-07-02T08:02:56.3356665Z retry_wait_seconds: 10 2025-07-02T08:02:56.3356904Z polling_interval_seconds: 1 2025-07-02T08:02:56.3357160Z warning_on_retry: true 2025-07-02T08:02:56.3357385Z continue_on_error: false 2025-07-02T08:02:56.3357769Z env: 2025-07-02T08:02:56.3357964Z DOCKER_IMAGE: nvidia/cudagl:11.4.0-base 2025-07-02T08:02:56.3358253Z REPOSITORY: pytorch/rl 2025-07-02T08:02:56.3358472Z PR_NUMBER: 3030 2025-07-02T08:02:56.3360623Z SCRIPT: if [[ "refs/pull/3030/merge" =~ release/* ]]; then export RELEASE=1 export TORCH_VERSION=stable else export RELEASE=0 export TORCH_VERSION=nightly fi set -euo pipefail export PYTHON_VERSION="3.9" export CU_VERSION="cu117" export TAR_OPTIONS="--no-same-owner" export UPLOAD_CHANNEL="nightly" export TF_CPP_MIN_LOG_LEVEL=0 export TD_GET_DEFAULTS_TO_NONE=1 bash .github/unittest/linux_libs/scripts_llm/setup_env.sh bash .github/unittest/linux_libs/scripts_llm/install.sh bash .github/unittest/linux_libs/scripts_llm/run_test.sh bash .github/unittest/linux_libs/scripts_llm/post_process.sh 2025-07-02T08:02:56.3362889Z RUNNER_ARTIFACT_DIR: /home/ec2-user/actions-runner/_work/_temp/artifacts 2025-07-02T08:02:56.3363423Z RUNNER_TEST_RESULTS_DIR: /home/ec2-user/actions-runner/_work/_temp/test-results 2025-07-02T08:02:56.3363919Z RUNNER_DOCS_DIR: /home/ec2-user/actions-runner/_work/_temp/docs 2025-07-02T08:02:56.3364280Z DRIVER_VERSION: 570.133.07 2025-07-02T08:02:56.3364538Z ##[endgroup] 2025-07-02T08:02:56.4245652Z == Installing nvidia driver NVIDIA-Linux-x86_64-570.133.07.run == 2025-07-02T08:02:56.4246247Z + pre_install_nvidia_driver_amzn2 2025-07-02T08:02:56.4251860Z + sudo yum remove -y nvidia-driver-latest-dkms 2025-07-02T08:02:56.7560851Z No match for argument: nvidia-driver-latest-dkms 2025-07-02T08:02:56.7561310Z No packages marked for removal. 
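The decision logic inside the setup-nvidia command above, condensed into a sketch (not the actual script): an existing driver is reused only when nvidia-smi succeeds and reports the expected version; otherwise the .run installer from the ossci-linux bucket is executed.

  # Condensed sketch of the driver check performed by setup-nvidia
  DRIVER_VERSION=570.133.07
  HAS_NVIDIA_DRIVER=0
  if command -v nvidia-smi >/dev/null; then
    set +e
    INSTALLED=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0)
    STATUS=$?
    set -e
    # 0 is success; the script also tolerates status 14 (see the gpu-operator issue it links)
    if { [ "$STATUS" -eq 0 ] || [ "$STATUS" -eq 14 ]; } && [ "$INSTALLED" = "$DRIVER_VERSION" ]; then
      HAS_NVIDIA_DRIVER=1
      echo "NVIDIA driver ($INSTALLED) already installed, skipping installation"
    fi
  fi
  if [ "$HAS_NVIDIA_DRIVER" -eq 0 ]; then
    # Download and run the installer; on failure the full script above resets the GPU via sysfs
    sudo curl -fsL -o /tmp/nvidia_driver \
      "https://s3.amazonaws.com/ossci-linux/nvidia_driver/NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run"
    sudo /bin/bash /tmp/nvidia_driver -s --no-drm
  fi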
2025-07-02T08:02:56.7637725Z Dependencies resolved. 2025-07-02T08:02:56.7648376Z Nothing to do. 2025-07-02T08:02:56.7648624Z Complete! 2025-07-02T08:02:56.8149875Z + install_nvidia_driver_common 2025-07-02T08:02:56.8153735Z + echo 'Before installing NVIDIA driver' 2025-07-02T08:02:56.8154022Z + lspci 2025-07-02T08:02:56.8155686Z Before installing NVIDIA driver 2025-07-02T08:02:56.8282020Z 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] 2025-07-02T08:02:56.8282535Z 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] 2025-07-02T08:02:56.8283084Z 00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08) 2025-07-02T08:02:56.8283590Z 00:03.0 VGA compatible controller: Amazon.com, Inc. Device 1111 2025-07-02T08:02:56.8284046Z 00:04.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe EBS Controller 2025-07-02T08:02:56.8284558Z 00:05.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA) 2025-07-02T08:02:56.8285311Z 00:1e.0 3D controller: NVIDIA Corporation GA102GL [A10G] (rev a1) 2025-07-02T08:02:56.8285774Z 00:1f.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe SSD Controller 2025-07-02T08:02:56.8286162Z + lsmod 2025-07-02T08:02:56.8335852Z Module Size Used by 2025-07-02T08:02:56.8336451Z xt_conntrack 16384 1 2025-07-02T08:02:56.8336950Z nft_chain_nat 16384 3 2025-07-02T08:02:56.8337836Z xt_MASQUERADE 20480 1 2025-07-02T08:02:56.8338405Z nf_nat 57344 2 nft_chain_nat,xt_MASQUERADE 2025-07-02T08:02:56.8339028Z nf_conntrack_netlink 57344 0 2025-07-02T08:02:56.8339784Z nf_conntrack 184320 4 xt_conntrack,nf_nat,nf_conntrack_netlink,xt_MASQUERADE 2025-07-02T08:02:56.8340610Z nf_defrag_ipv6 24576 1 nf_conntrack 2025-07-02T08:02:56.8341199Z nf_defrag_ipv4 16384 1 nf_conntrack 2025-07-02T08:02:56.8341738Z xfrm_user 57344 1 2025-07-02T08:02:56.8342234Z xfrm_algo 16384 1 xfrm_user 2025-07-02T08:02:56.8342772Z xt_addrtype 16384 2 2025-07-02T08:02:56.8343252Z nft_compat 20480 4 2025-07-02T08:02:56.8343821Z nf_tables 311296 57 nft_compat,nft_chain_nat 2025-07-02T08:02:56.8344366Z nfnetlink 20480 4 nft_compat,nf_conntrack_netlink,nf_tables 2025-07-02T08:02:56.8344730Z br_netfilter 36864 0 2025-07-02T08:02:56.8344990Z bridge 323584 1 br_netfilter 2025-07-02T08:02:56.8345452Z stp 16384 1 bridge 2025-07-02T08:02:56.8345720Z llc 16384 2 bridge,stp 2025-07-02T08:02:56.8345991Z overlay 167936 0 2025-07-02T08:02:56.8346229Z tls 139264 0 2025-07-02T08:02:56.8346462Z nls_ascii 16384 1 2025-07-02T08:02:56.8346700Z nls_cp437 20480 1 2025-07-02T08:02:56.8346930Z vfat 24576 1 2025-07-02T08:02:56.8347167Z fat 86016 1 vfat 2025-07-02T08:02:56.8347415Z sunrpc 700416 1 2025-07-02T08:02:56.8347647Z i8042 45056 0 2025-07-02T08:02:56.8347889Z serio 28672 3 i8042 2025-07-02T08:02:56.8348144Z ena 180224 0 2025-07-02T08:02:56.8348374Z button 24576 0 2025-07-02T08:02:56.8348617Z ghash_clmulni_intel 16384 0 2025-07-02T08:02:56.8348865Z sch_fq_codel 20480 17 2025-07-02T08:02:56.8349102Z fuse 184320 1 2025-07-02T08:02:56.8349345Z dm_mod 188416 0 2025-07-02T08:02:56.8349570Z loop 36864 0 2025-07-02T08:02:56.8349811Z configfs 57344 1 2025-07-02T08:02:56.8350039Z dmi_sysfs 20480 0 2025-07-02T08:02:56.8350288Z crc32_pclmul 16384 0 2025-07-02T08:02:56.8350526Z crc32c_intel 24576 0 2025-07-02T08:02:56.8350765Z efivarfs 24576 1 2025-07-02T08:02:56.8351004Z + modinfo nvidia 2025-07-02T08:02:56.8356065Z filename: /lib/modules/6.1.141-155.222.amzn2023.x86_64/kernel/drivers/video/nvidia.ko 2025-07-02T08:02:56.8356592Z import_ns: DMA_BUF 
2025-07-02T08:02:56.8356833Z alias: char-major-195-* 2025-07-02T08:02:56.8357097Z version: 570.133.07 2025-07-02T08:02:56.8357327Z supported: external 2025-07-02T08:02:56.8357570Z license: Dual MIT/GPL 2025-07-02T08:02:56.8357846Z firmware: nvidia/570.133.07/gsp_tu10x.bin 2025-07-02T08:02:56.8358168Z firmware: nvidia/570.133.07/gsp_ga10x.bin 2025-07-02T08:02:56.8358486Z srcversion: 49515739FD8F721A3F2F714 2025-07-02T08:02:56.8358791Z alias: pci:v000010DEd*sv*sd*bc06sc80i00* 2025-07-02T08:02:56.8359180Z alias: pci:v000010DEd*sv*sd*bc03sc02i00* 2025-07-02T08:02:56.8359502Z alias: pci:v000010DEd*sv*sd*bc03sc00i00* 2025-07-02T08:02:56.8359804Z depends: i2c-core,drm 2025-07-02T08:02:56.8360042Z retpoline: Y 2025-07-02T08:02:56.8360254Z name: nvidia 2025-07-02T08:02:56.8360604Z vermagic: 6.1.141-155.222.amzn2023.x86_64 SMP preempt mod_unload modversions 2025-07-02T08:02:56.8361205Z parm: NvSwitchRegDwords:NvSwitch regkey (charp) 2025-07-02T08:02:56.8361640Z parm: NvSwitchBlacklist:NvSwitchBlacklist=uuid[,uuid...] (charp) 2025-07-02T08:02:56.8362037Z parm: NVreg_ResmanDebugLevel:int 2025-07-02T08:02:56.8362334Z parm: NVreg_RmLogonRC:int 2025-07-02T08:02:56.8362616Z parm: NVreg_ModifyDeviceFiles:int 2025-07-02T08:02:56.8362917Z parm: NVreg_DeviceFileUID:int 2025-07-02T08:02:56.8363206Z parm: NVreg_DeviceFileGID:int 2025-07-02T08:02:56.8363500Z parm: NVreg_DeviceFileMode:int 2025-07-02T08:02:56.8363848Z parm: NVreg_InitializeSystemMemoryAllocations:int 2025-07-02T08:02:56.8364217Z parm: NVreg_UsePageAttributeTable:int 2025-07-02T08:02:56.8364541Z parm: NVreg_EnablePCIeGen3:int 2025-07-02T08:02:56.8364825Z parm: NVreg_EnableMSI:int 2025-07-02T08:02:56.8365118Z parm: NVreg_EnableStreamMemOPs:int 2025-07-02T08:02:56.8365465Z parm: NVreg_RestrictProfilingToAdminUsers:int 2025-07-02T08:02:56.8365857Z parm: NVreg_PreserveVideoMemoryAllocations:int 2025-07-02T08:02:56.8366219Z parm: NVreg_EnableS0ixPowerManagement:int 2025-07-02T08:02:56.8366623Z parm: NVreg_S0ixPowerManagementVideoMemoryThreshold:int 2025-07-02T08:02:56.8367016Z parm: NVreg_DynamicPowerManagement:int 2025-07-02T08:02:56.8367420Z parm: NVreg_DynamicPowerManagementVideoMemoryThreshold:int 2025-07-02T08:02:56.8367916Z parm: NVreg_EnableGpuFirmware:int 2025-07-02T08:02:56.8368235Z parm: NVreg_EnableGpuFirmwareLogs:int 2025-07-02T08:02:56.8368593Z parm: NVreg_OpenRmEnableUnsupportedGpus:int 2025-07-02T08:02:56.8368949Z parm: NVreg_EnableUserNUMAManagement:int 2025-07-02T08:02:56.8369281Z parm: NVreg_MemoryPoolSize:int 2025-07-02T08:02:56.8369588Z parm: NVreg_KMallocHeapMaxSize:int 2025-07-02T08:02:56.8369908Z parm: NVreg_VMallocHeapMaxSize:int 2025-07-02T08:02:56.8370220Z parm: NVreg_IgnoreMMIOCheck:int 2025-07-02T08:02:56.8370519Z parm: NVreg_NvLinkDisable:int 2025-07-02T08:02:56.8370856Z parm: NVreg_EnablePCIERelaxedOrderingMode:int 2025-07-02T08:02:56.8371207Z parm: NVreg_RegisterPCIDriver:int 2025-07-02T08:02:56.8371531Z parm: NVreg_EnableResizableBar:int 2025-07-02T08:02:56.8371849Z parm: NVreg_EnableDbgBreakpoint:int 2025-07-02T08:02:56.8372197Z parm: NVreg_EnableNonblockingOpen:int 2025-07-02T08:02:56.8372518Z parm: NVreg_RegistryDwords:charp 2025-07-02T08:02:56.8372855Z parm: NVreg_RegistryDwordsPerDevice:charp 2025-07-02T08:02:56.8373177Z parm: NVreg_RmMsg:charp 2025-07-02T08:02:56.8373451Z parm: NVreg_GpuBlacklist:charp 2025-07-02T08:02:56.8373771Z parm: NVreg_TemporaryFilePath:charp 2025-07-02T08:02:56.8374078Z parm: NVreg_ExcludedGpus:charp 2025-07-02T08:02:56.8374384Z parm: NVreg_DmaRemapPeerMmio:int 2025-07-02T08:02:56.8374699Z 
parm: NVreg_RmNvlinkBandwidth:charp 2025-07-02T08:02:56.8375049Z parm: NVreg_RmNvlinkBandwidthLinkCount:int 2025-07-02T08:02:56.8375384Z parm: NVreg_ImexChannelCount:int 2025-07-02T08:02:56.8375701Z parm: NVreg_CreateImexChannel0:int 2025-07-02T08:02:56.8376036Z parm: NVreg_GrdmaPciTopoCheckOverride:int 2025-07-02T08:02:56.8376361Z parm: rm_firmware_active:charp 2025-07-02T08:02:56.8376655Z + HAS_NVIDIA_DRIVER=0 2025-07-02T08:02:56.8376880Z ++ command -v nvidia-smi 2025-07-02T08:02:56.8377132Z + '[' -x /usr/bin/nvidia-smi ']' 2025-07-02T08:02:56.8377370Z + set +e 2025-07-02T08:02:56.8377668Z ++ nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0 2025-07-02T08:02:58.6674012Z + INSTALLED_DRIVER_VERSION=570.133.07 2025-07-02T08:02:58.6674381Z + NVIDIA_SMI_STATUS=0 2025-07-02T08:02:58.6674639Z + '[' 0 -ne 0 ']' 2025-07-02T08:02:58.6674874Z + '[' 570.133.07 '!=' 570.133.07 ']' 2025-07-02T08:02:58.6675134Z + HAS_NVIDIA_DRIVER=1 2025-07-02T08:02:58.6675952Z + echo 'NVIDIA driver (570.133.07) has already been installed. Skipping NVIDIA driver installation' 2025-07-02T08:02:58.6676424Z + set -e 2025-07-02T08:02:58.6676602Z + '[' 1 -eq 0 ']' 2025-07-02T08:02:58.6676984Z NVIDIA driver (570.133.07) has already been installed. Skipping NVIDIA driver installation 2025-07-02T08:02:58.6677440Z + post_install_nvidia_driver_common 2025-07-02T08:02:58.6680489Z + sudo modprobe nvidia 2025-07-02T08:02:58.7940859Z + echo 'After installing NVIDIA driver' 2025-07-02T08:02:58.7941173Z + lspci 2025-07-02T08:02:58.7941377Z After installing NVIDIA driver 2025-07-02T08:02:58.8068963Z 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] 2025-07-02T08:02:58.8069451Z 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] 2025-07-02T08:02:58.8070060Z 00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08) 2025-07-02T08:02:58.8070644Z 00:03.0 VGA compatible controller: Amazon.com, Inc. Device 1111 2025-07-02T08:02:58.8071133Z 00:04.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe EBS Controller 2025-07-02T08:02:58.8080454Z 00:05.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA) 2025-07-02T08:02:58.8080936Z 00:1e.0 3D controller: NVIDIA Corporation GA102GL [A10G] (rev a1) 2025-07-02T08:02:58.8081400Z 00:1f.0 Non-Volatile memory controller: Amazon.com, Inc. 
NVMe SSD Controller 2025-07-02T08:02:58.8082176Z + lsmod 2025-07-02T08:02:58.8108495Z Module Size Used by 2025-07-02T08:02:58.8108836Z nvidia_uvm 1884160 0 2025-07-02T08:02:58.8109176Z nvidia 11583488 1 nvidia_uvm 2025-07-02T08:02:58.8109448Z drm 602112 1 nvidia 2025-07-02T08:02:58.8109739Z drm_panel_orientation_quirks 32768 1 drm 2025-07-02T08:02:58.8110028Z backlight 24576 1 drm 2025-07-02T08:02:58.8110303Z i2c_core 110592 2 nvidia,drm 2025-07-02T08:02:58.8110574Z xt_conntrack 16384 1 2025-07-02T08:02:58.8110827Z nft_chain_nat 16384 3 2025-07-02T08:02:58.8111085Z xt_MASQUERADE 20480 1 2025-07-02T08:02:58.8111364Z nf_nat 57344 2 nft_chain_nat,xt_MASQUERADE 2025-07-02T08:02:58.8111687Z nf_conntrack_netlink 57344 0 2025-07-02T08:02:58.8112063Z nf_conntrack 184320 4 xt_conntrack,nf_nat,nf_conntrack_netlink,xt_MASQUERADE 2025-07-02T08:02:58.8112487Z nf_defrag_ipv6 24576 1 nf_conntrack 2025-07-02T08:02:58.8112786Z nf_defrag_ipv4 16384 1 nf_conntrack 2025-07-02T08:02:58.8113068Z xfrm_user 57344 1 2025-07-02T08:02:58.8113314Z xfrm_algo 16384 1 xfrm_user 2025-07-02T08:02:58.8113591Z xt_addrtype 16384 2 2025-07-02T08:02:58.8113830Z nft_compat 20480 4 2025-07-02T08:02:58.8114112Z nf_tables 311296 57 nft_compat,nft_chain_nat 2025-07-02T08:02:58.8114517Z nfnetlink 20480 4 nft_compat,nf_conntrack_netlink,nf_tables 2025-07-02T08:02:58.8114876Z br_netfilter 36864 0 2025-07-02T08:02:58.8115182Z bridge 323584 1 br_netfilter 2025-07-02T08:02:58.8115475Z stp 16384 1 bridge 2025-07-02T08:02:58.8115748Z llc 16384 2 bridge,stp 2025-07-02T08:02:58.8116020Z overlay 167936 0 2025-07-02T08:02:58.8116258Z tls 139264 0 2025-07-02T08:02:58.8116490Z nls_ascii 16384 1 2025-07-02T08:02:58.8116719Z nls_cp437 20480 1 2025-07-02T08:02:58.8116956Z vfat 24576 1 2025-07-02T08:02:58.8117181Z fat 86016 1 vfat 2025-07-02T08:02:58.8117428Z sunrpc 700416 1 2025-07-02T08:02:58.8117656Z i8042 45056 0 2025-07-02T08:02:58.8117888Z serio 28672 3 i8042 2025-07-02T08:02:58.8118138Z ena 180224 0 2025-07-02T08:02:58.8118360Z button 24576 0 2025-07-02T08:02:58.8118594Z ghash_clmulni_intel 16384 0 2025-07-02T08:02:58.8118838Z sch_fq_codel 20480 17 2025-07-02T08:02:58.8119137Z fuse 184320 1 2025-07-02T08:02:58.8119541Z dm_mod 188416 0 2025-07-02T08:02:58.8119767Z loop 36864 0 2025-07-02T08:02:58.8120001Z configfs 57344 1 2025-07-02T08:02:58.8120232Z dmi_sysfs 20480 0 2025-07-02T08:02:58.8120468Z crc32_pclmul 16384 0 2025-07-02T08:02:58.8120696Z crc32c_intel 24576 0 2025-07-02T08:02:58.8120931Z efivarfs 24576 1 2025-07-02T08:02:58.8121168Z + modinfo nvidia 2025-07-02T08:02:58.8130454Z filename: /lib/modules/6.1.141-155.222.amzn2023.x86_64/kernel/drivers/video/nvidia.ko 2025-07-02T08:02:58.8130997Z import_ns: DMA_BUF 2025-07-02T08:02:58.8131266Z alias: char-major-195-* 2025-07-02T08:02:58.8131526Z version: 570.133.07 2025-07-02T08:02:58.8131756Z supported: external 2025-07-02T08:02:58.8131993Z license: Dual MIT/GPL 2025-07-02T08:02:58.8132268Z firmware: nvidia/570.133.07/gsp_tu10x.bin 2025-07-02T08:02:58.8132589Z firmware: nvidia/570.133.07/gsp_ga10x.bin 2025-07-02T08:02:58.8132914Z srcversion: 49515739FD8F721A3F2F714 2025-07-02T08:02:58.8133214Z alias: pci:v000010DEd*sv*sd*bc06sc80i00* 2025-07-02T08:02:58.8133544Z alias: pci:v000010DEd*sv*sd*bc03sc02i00* 2025-07-02T08:02:58.8133862Z alias: pci:v000010DEd*sv*sd*bc03sc00i00* 2025-07-02T08:02:58.8134166Z depends: i2c-core,drm 2025-07-02T08:02:58.8134406Z retpoline: Y 2025-07-02T08:02:58.8134737Z name: nvidia 2025-07-02T08:02:58.8135084Z vermagic: 6.1.141-155.222.amzn2023.x86_64 SMP preempt 
mod_unload modversions 2025-07-02T08:02:58.8135543Z parm: NvSwitchRegDwords:NvSwitch regkey (charp) 2025-07-02T08:02:58.8135978Z parm: NvSwitchBlacklist:NvSwitchBlacklist=uuid[,uuid...] (charp) 2025-07-02T08:02:58.8136380Z parm: NVreg_ResmanDebugLevel:int 2025-07-02T08:02:58.8136684Z parm: NVreg_RmLogonRC:int 2025-07-02T08:02:58.8136971Z parm: NVreg_ModifyDeviceFiles:int 2025-07-02T08:02:58.8137501Z parm: NVreg_DeviceFileUID:int 2025-07-02T08:02:58.8137789Z parm: NVreg_DeviceFileGID:int 2025-07-02T08:02:58.8138084Z parm: NVreg_DeviceFileMode:int 2025-07-02T08:02:58.8138433Z parm: NVreg_InitializeSystemMemoryAllocations:int 2025-07-02T08:02:58.8138804Z parm: NVreg_UsePageAttributeTable:int 2025-07-02T08:02:58.8139125Z parm: NVreg_EnablePCIeGen3:int 2025-07-02T08:02:58.8139416Z parm: NVreg_EnableMSI:int 2025-07-02T08:02:58.8139709Z parm: NVreg_EnableStreamMemOPs:int 2025-07-02T08:02:58.8140050Z parm: NVreg_RestrictProfilingToAdminUsers:int 2025-07-02T08:02:58.8140434Z parm: NVreg_PreserveVideoMemoryAllocations:int 2025-07-02T08:02:58.8140800Z parm: NVreg_EnableS0ixPowerManagement:int 2025-07-02T08:02:58.8141205Z parm: NVreg_S0ixPowerManagementVideoMemoryThreshold:int 2025-07-02T08:02:58.8141596Z parm: NVreg_DynamicPowerManagement:int 2025-07-02T08:02:58.8142001Z parm: NVreg_DynamicPowerManagementVideoMemoryThreshold:int 2025-07-02T08:02:58.8142398Z parm: NVreg_EnableGpuFirmware:int 2025-07-02T08:02:58.8142719Z parm: NVreg_EnableGpuFirmwareLogs:int 2025-07-02T08:02:58.8143077Z parm: NVreg_OpenRmEnableUnsupportedGpus:int 2025-07-02T08:02:58.8143432Z parm: NVreg_EnableUserNUMAManagement:int 2025-07-02T08:02:58.8143761Z parm: NVreg_MemoryPoolSize:int 2025-07-02T08:02:58.8144069Z parm: NVreg_KMallocHeapMaxSize:int 2025-07-02T08:02:58.8144391Z parm: NVreg_VMallocHeapMaxSize:int 2025-07-02T08:02:58.8144701Z parm: NVreg_IgnoreMMIOCheck:int 2025-07-02T08:02:58.8144998Z parm: NVreg_NvLinkDisable:int 2025-07-02T08:02:58.8145332Z parm: NVreg_EnablePCIERelaxedOrderingMode:int 2025-07-02T08:02:58.8145679Z parm: NVreg_RegisterPCIDriver:int 2025-07-02T08:02:58.8145994Z parm: NVreg_EnableResizableBar:int 2025-07-02T08:02:58.8146310Z parm: NVreg_EnableDbgBreakpoint:int 2025-07-02T08:02:58.8146776Z parm: NVreg_EnableNonblockingOpen:int 2025-07-02T08:02:58.8147101Z parm: NVreg_RegistryDwords:charp 2025-07-02T08:02:58.8147432Z parm: NVreg_RegistryDwordsPerDevice:charp 2025-07-02T08:02:58.8147752Z parm: NVreg_RmMsg:charp 2025-07-02T08:02:58.8148023Z parm: NVreg_GpuBlacklist:charp 2025-07-02T08:02:58.8148341Z parm: NVreg_TemporaryFilePath:charp 2025-07-02T08:02:58.8148655Z parm: NVreg_ExcludedGpus:charp 2025-07-02T08:02:58.8148972Z parm: NVreg_DmaRemapPeerMmio:int 2025-07-02T08:02:58.8149281Z parm: NVreg_RmNvlinkBandwidth:charp 2025-07-02T08:02:58.8149625Z parm: NVreg_RmNvlinkBandwidthLinkCount:int 2025-07-02T08:02:58.8149965Z parm: NVreg_ImexChannelCount:int 2025-07-02T08:02:58.8150278Z parm: NVreg_CreateImexChannel0:int 2025-07-02T08:02:58.8150613Z parm: NVreg_GrdmaPciTopoCheckOverride:int 2025-07-02T08:02:58.8150937Z parm: rm_firmware_active:charp 2025-07-02T08:02:58.8151211Z + set +e 2025-07-02T08:02:58.8151385Z + nvidia-smi 2025-07-02T08:03:00.2278883Z Wed Jul 2 08:03:00 2025 2025-07-02T08:03:00.2279722Z +-----------------------------------------------------------------------------------------+ 2025-07-02T08:03:00.2280696Z | NVIDIA-SMI 570.133.07 Driver Version: 570.133.07 CUDA Version: 12.8 | 2025-07-02T08:03:00.2282130Z 
|-----------------------------------------+------------------------+----------------------+ 2025-07-02T08:03:00.2283084Z | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | 2025-07-02T08:03:00.2284108Z | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | 2025-07-02T08:03:00.2284950Z | | | MIG M. | 2025-07-02T08:03:00.2285426Z |=========================================+========================+======================| 2025-07-02T08:03:00.2343299Z | 0 NVIDIA A10G Off | 00000000:00:1E.0 Off | 0 | 2025-07-02T08:03:00.2343750Z | 0% 35C P0 68W / 300W | 0MiB / 23028MiB | 4% Default | 2025-07-02T08:03:00.2344129Z | | | N/A | 2025-07-02T08:03:00.2344512Z +-----------------------------------------+------------------------+----------------------+ 2025-07-02T08:03:00.2345795Z 2025-07-02T08:03:00.2346224Z +-----------------------------------------------------------------------------------------+ 2025-07-02T08:03:00.2346652Z | Processes: | 2025-07-02T08:03:00.2347090Z | GPU GI CI PID Type Process name GPU Memory | 2025-07-02T08:03:00.2347508Z | ID ID Usage | 2025-07-02T08:03:00.2347855Z |=========================================================================================| 2025-07-02T08:03:00.2348282Z | No running processes found | 2025-07-02T08:03:00.2348749Z +-----------------------------------------------------------------------------------------+ 2025-07-02T08:03:00.6538263Z + nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 2025-07-02T08:03:02.0680464Z NVIDIA A10G 2025-07-02T08:03:02.3378539Z + NVIDIA_SMI_STATUS=0 2025-07-02T08:03:02.3378784Z + '[' 0 -eq 0 ']' 2025-07-02T08:03:02.3379017Z + echo 'INFO: Ignoring allowed status 0' 2025-07-02T08:03:02.3379293Z + set -e 2025-07-02T08:03:02.3379487Z INFO: Ignoring allowed status 0 2025-07-02T08:03:02.3390768Z == Installing nvidia container toolkit for amzn2023 == 2025-07-02T08:03:02.3394317Z + sudo yum install -y yum-utils 2025-07-02T08:03:02.7559295Z Last metadata expiration check: 0:00:55 ago on Wed Jul 2 08:02:07 2025. 2025-07-02T08:03:02.7802244Z Package dnf-utils-4.3.0-13.amzn2023.0.5.noarch is already installed. 2025-07-02T08:03:02.8278388Z Dependencies resolved. 2025-07-02T08:03:02.8503535Z Nothing to do. 2025-07-02T08:03:02.8504379Z Complete! 2025-07-02T08:03:02.8907798Z + [[ amzn2023 == \a\m\z\n\2\0\2\3 ]] 2025-07-02T08:03:02.8908478Z + YUM_REPO_URL=https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo 2025-07-02T08:03:02.8909367Z + sudo yum-config-manager --add-repo https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo 2025-07-02T08:03:03.1897047Z Adding repo from: https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo 2025-07-02T08:03:03.2354688Z + sudo yum install -y nvidia-docker2 nvidia-container-toolkit-1.16.2 libnvidia-container-tools-1.16.2 libnvidia-container1-1.16.2 nvidia-container-toolkit-base-1.16.2 2025-07-02T08:03:03.7566550Z nvidia-container-toolkit 19 kB/s | 833 B 00:00 2025-07-02T08:03:03.7815648Z Package nvidia-docker2-2.14.0-1.noarch is already installed. 2025-07-02T08:03:03.8310272Z Dependencies resolved. 
2025-07-02T08:03:02.3390768Z == Installing nvidia container toolkit for amzn2023 ==
2025-07-02T08:03:02.3394317Z + sudo yum install -y yum-utils
2025-07-02T08:03:02.7559295Z Last metadata expiration check: 0:00:55 ago on Wed Jul 2 08:02:07 2025.
2025-07-02T08:03:02.7802244Z Package dnf-utils-4.3.0-13.amzn2023.0.5.noarch is already installed.
2025-07-02T08:03:02.8278388Z Dependencies resolved.
2025-07-02T08:03:02.8503535Z Nothing to do.
2025-07-02T08:03:02.8504379Z Complete!
2025-07-02T08:03:02.8907798Z + [[ amzn2023 == \a\m\z\n\2\0\2\3 ]]
2025-07-02T08:03:02.8908478Z + YUM_REPO_URL=https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
2025-07-02T08:03:02.8909367Z + sudo yum-config-manager --add-repo https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
2025-07-02T08:03:03.1897047Z Adding repo from: https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
2025-07-02T08:03:03.2354688Z + sudo yum install -y nvidia-docker2 nvidia-container-toolkit-1.16.2 libnvidia-container-tools-1.16.2 libnvidia-container1-1.16.2 nvidia-container-toolkit-base-1.16.2
2025-07-02T08:03:03.7566550Z nvidia-container-toolkit 19 kB/s | 833 B 00:00
2025-07-02T08:03:03.7815648Z Package nvidia-docker2-2.14.0-1.noarch is already installed.
2025-07-02T08:03:03.8310272Z Dependencies resolved.
2025-07-02T08:03:03.8537868Z ================================================================================
2025-07-02T08:03:03.8538291Z Package                        Arch    Version   Repository                Size
2025-07-02T08:03:03.8539030Z ================================================================================
2025-07-02T08:03:03.8539325Z Downgrading:
2025-07-02T08:03:03.8539683Z libnvidia-container-tools      x86_64  1.16.2-1  nvidia-container-toolkit  39 k
2025-07-02T08:03:03.8540233Z libnvidia-container1           x86_64  1.16.2-1  nvidia-container-toolkit  1.0 M
2025-07-02T08:03:03.8540778Z nvidia-container-toolkit       x86_64  1.16.2-1  nvidia-container-toolkit  1.2 M
2025-07-02T08:03:03.8541402Z nvidia-container-toolkit-base  x86_64  1.16.2-1  nvidia-container-toolkit  5.6 M
2025-07-02T08:03:03.8541937Z
2025-07-02T08:03:03.8542073Z Transaction Summary
2025-07-02T08:03:03.8542347Z ================================================================================
2025-07-02T08:03:03.8542643Z Downgrade  4 Packages
2025-07-02T08:03:03.8542782Z
2025-07-02T08:03:03.8542916Z Total download size: 7.8 M
2025-07-02T08:03:03.8543668Z Downloading Packages:
2025-07-02T08:03:03.8715818Z (1/4): libnvidia-container-tools-1.16.2-1.x86_6 2.4 MB/s | 39 kB 00:00
2025-07-02T08:03:03.8895797Z (2/4): libnvidia-container1-1.16.2-1.x86_64.rpm 29 MB/s | 1.0 MB 00:00
2025-07-02T08:03:03.9054038Z (3/4): nvidia-container-toolkit-1.16.2-1.x86_64 25 MB/s | 1.2 MB 00:00
2025-07-02T08:03:03.9380125Z (4/4): nvidia-container-toolkit-base-1.16.2-1.x 85 MB/s | 5.6 MB 00:00
2025-07-02T08:03:03.9388787Z --------------------------------------------------------------------------------
2025-07-02T08:03:03.9391777Z Total 93 MB/s | 7.8 MB 00:00
2025-07-02T08:03:03.9394339Z Running transaction check
2025-07-02T08:03:03.9507179Z Transaction check succeeded.
2025-07-02T08:03:03.9507572Z Running transaction test
2025-07-02T08:03:03.9919383Z Transaction test succeeded.
2025-07-02T08:03:03.9922139Z Running transaction
2025-07-02T08:03:04.5722752Z Preparing : 1/1
2025-07-02T08:03:04.6630825Z Downgrading : nvidia-container-toolkit-base-1.16.2-1.x86_64 1/8
2025-07-02T08:03:04.6674588Z Downgrading : libnvidia-container1-1.16.2-1.x86_64 2/8
2025-07-02T08:03:04.6967657Z Running scriptlet: libnvidia-container1-1.16.2-1.x86_64 2/8
2025-07-02T08:03:04.8038875Z Downgrading : libnvidia-container-tools-1.16.2-1.x86_64 3/8
2025-07-02T08:03:04.8081392Z Downgrading : nvidia-container-toolkit-1.16.2-1.x86_64 4/8
2025-07-02T08:03:04.8282865Z Running scriptlet: nvidia-container-toolkit-1.16.2-1.x86_64 4/8
2025-07-02T08:03:04.8283659Z Cleanup : nvidia-container-toolkit-1.17.8-1.x86_64 5/8
2025-07-02T08:03:04.8480058Z Running scriptlet: nvidia-container-toolkit-1.17.8-1.x86_64 5/8
2025-07-02T08:03:04.8522697Z Cleanup : libnvidia-container-tools-1.17.8-1.x86_64 6/8
2025-07-02T08:03:04.8523218Z Cleanup : libnvidia-container1-1.17.8-1.x86_64 7/8
2025-07-02T08:03:04.8773587Z Running scriptlet: libnvidia-container1-1.17.8-1.x86_64 7/8
2025-07-02T08:03:04.8806933Z Cleanup : nvidia-container-toolkit-base-1.17.8-1.x86_64 8/8
2025-07-02T08:03:04.9421170Z Running scriptlet: nvidia-container-toolkit-1.16.2-1.x86_64 8/8
2025-07-02T08:03:05.1216373Z Running scriptlet: nvidia-container-toolkit-base-1.17.8-1.x86_64 8/8
2025-07-02T08:03:05.1217183Z Verifying : libnvidia-container-tools-1.16.2-1.x86_64 1/8
2025-07-02T08:03:05.1217747Z Verifying : libnvidia-container-tools-1.17.8-1.x86_64 2/8
2025-07-02T08:03:05.1218261Z Verifying : libnvidia-container1-1.16.2-1.x86_64 3/8
2025-07-02T08:03:05.1218774Z Verifying : libnvidia-container1-1.17.8-1.x86_64 4/8
2025-07-02T08:03:05.1219279Z Verifying : nvidia-container-toolkit-1.16.2-1.x86_64 5/8
2025-07-02T08:03:05.1220194Z Verifying : nvidia-container-toolkit-1.17.8-1.x86_64 6/8
2025-07-02T08:03:05.1220711Z Verifying : nvidia-container-toolkit-base-1.16.2-1.x86_64 7/8
2025-07-02T08:03:05.3106218Z Verifying : nvidia-container-toolkit-base-1.17.8-1.x86_64 8/8
2025-07-02T08:03:05.3106637Z
2025-07-02T08:03:05.3106726Z Downgraded:
2025-07-02T08:03:05.3107073Z libnvidia-container-tools-1.16.2-1.x86_64
2025-07-02T08:03:05.3107609Z libnvidia-container1-1.16.2-1.x86_64
2025-07-02T08:03:05.3108159Z nvidia-container-toolkit-1.16.2-1.x86_64
2025-07-02T08:03:05.3108709Z nvidia-container-toolkit-base-1.16.2-1.x86_64
2025-07-02T08:03:05.3109053Z
2025-07-02T08:03:05.3109132Z Complete!
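The install command above pins every component of the NVIDIA container stack to 1.16.2-1, and dnf resolves that as a downgrade of the 1.17.8-1 builds already present on the AMI. A hedged sketch of the same pinning approach, with the version factored into a variable (the variable name is illustrative, not from the workflow):

  # Pin the container toolkit to a known-good release; dnf/yum resolves a
  # downgrade when a newer build is already installed, as seen in the log.
  TOOLKIT_VERSION=1.16.2
  sudo yum install -y \
    nvidia-docker2 \
    "nvidia-container-toolkit-${TOOLKIT_VERSION}" \
    "libnvidia-container-tools-${TOOLKIT_VERSION}" \
    "libnvidia-container1-${TOOLKIT_VERSION}" \
    "nvidia-container-toolkit-base-${TOOLKIT_VERSION}"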
2025-07-02T08:03:05.3609159Z + sudo systemctl restart docker
2025-07-02T08:03:09.4017302Z Wed Jul 2 08:03:09 2025
2025-07-02T08:03:09.4018117Z +-----------------------------------------------------------------------------------------+
2025-07-02T08:03:09.4018820Z | NVIDIA-SMI 570.133.07       Driver Version: 570.133.07       CUDA Version: 12.8         |
2025-07-02T08:03:09.4019491Z |-----------------------------------------+------------------------+----------------------+
2025-07-02T08:03:09.4020164Z | GPU  Name            Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
2025-07-02T08:03:09.4020876Z | Fan  Temp  Perf  Pwr:Usage/Cap     | Memory-Usage         | GPU-Util  Compute M. |
2025-07-02T08:03:09.4021476Z |                                    |                      |               MIG M. |
2025-07-02T08:03:09.4021914Z |=========================================+========================+======================|
2025-07-02T08:03:09.4102288Z | 0  NVIDIA A10G  On | 00000000:00:1E.0 Off | 0 |
2025-07-02T08:03:09.4102878Z | 0%  35C  P0  68W / 300W | 0MiB / 23028MiB | 4%  Default |
2025-07-02T08:03:09.4103419Z |                                    |                      |                  N/A |
2025-07-02T08:03:09.4103958Z +-----------------------------------------+------------------------+----------------------+
2025-07-02T08:03:09.4104485Z
2025-07-02T08:03:09.4105016Z +-----------------------------------------------------------------------------------------+
2025-07-02T08:03:09.4105945Z | Processes: |
2025-07-02T08:03:09.4106551Z | GPU  GI  CI  PID  Type  Process name  GPU Memory |
2025-07-02T08:03:09.4107108Z | ID  ID  Usage |
2025-07-02T08:03:09.4107577Z |=========================================================================================|
2025-07-02T08:03:09.4108151Z | No running processes found |
2025-07-02T08:03:09.4108779Z +-----------------------------------------------------------------------------------------+
2025-07-02T08:03:09.5767786Z Unable to find image 'public.ecr.aws/docker/library/python:3.13' locally
2025-07-02T08:03:09.8810959Z 3.13: Pulling from docker/library/python
2025-07-02T08:03:10.0097775Z c19952135643: Pulling fs layer
2025-07-02T08:03:10.0098074Z 7bbf972c6c2f: Pulling fs layer
2025-07-02T08:03:10.0098336Z 900e2c02f17f: Pulling fs layer
2025-07-02T08:03:10.0098669Z abe9c1abe6f3: Pulling fs layer
2025-07-02T08:03:10.0099055Z 562e9f67c041: Pulling fs layer
2025-07-02T08:03:10.0099407Z 8ae8ebad5c0e: Pulling fs layer
2025-07-02T08:03:10.0099766Z 5b1a73f6734a: Pulling fs layer
2025-07-02T08:03:10.0100103Z 562e9f67c041: Waiting
2025-07-02T08:03:10.0100412Z 8ae8ebad5c0e: Waiting
2025-07-02T08:03:10.0100717Z 5b1a73f6734a: Waiting
2025-07-02T08:03:10.0101017Z abe9c1abe6f3: Waiting
2025-07-02T08:03:10.1331809Z 7bbf972c6c2f: Verifying Checksum
2025-07-02T08:03:10.1332198Z 7bbf972c6c2f: Download complete
2025-07-02T08:03:10.2171017Z c19952135643: Verifying Checksum
2025-07-02T08:03:10.2171775Z c19952135643: Download complete
2025-07-02T08:03:10.2229370Z 900e2c02f17f: Verifying Checksum
2025-07-02T08:03:10.2229747Z 900e2c02f17f: Download complete
2025-07-02T08:03:10.3201181Z 562e9f67c041: Verifying Checksum
2025-07-02T08:03:10.3201513Z 562e9f67c041: Download complete
2025-07-02T08:03:10.3532038Z 8ae8ebad5c0e: Verifying Checksum
2025-07-02T08:03:10.3532344Z 8ae8ebad5c0e: Download complete
2025-07-02T08:03:10.3791714Z 5b1a73f6734a: Verifying Checksum
2025-07-02T08:03:10.3792123Z 5b1a73f6734a: Download complete
2025-07-02T08:03:10.8034111Z abe9c1abe6f3: Verifying Checksum
2025-07-02T08:03:10.8034457Z abe9c1abe6f3: Download complete
2025-07-02T08:03:12.1798513Z c19952135643: Pull complete
2025-07-02T08:03:12.7572749Z 7bbf972c6c2f: Pull complete
2025-07-02T08:03:15.0599084Z 900e2c02f17f: Pull complete
2025-07-02T08:03:20.4942976Z abe9c1abe6f3: Pull complete
2025-07-02T08:03:20.7734905Z 562e9f67c041: Pull complete
2025-07-02T08:03:21.5138867Z 8ae8ebad5c0e: Pull complete
2025-07-02T08:03:21.5368171Z 5b1a73f6734a: Pull complete
2025-07-02T08:03:21.5504591Z Digest: sha256:0aafd87e2438b9db15ffc16e86eed18224c5bc10ab71671f379cae240f3c044e
2025-07-02T08:03:21.5546158Z Status: Downloaded newer image for public.ecr.aws/docker/library/python:3.13
2025-07-02T08:03:22.4200900Z ##[error]The operation was canceled.
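With the pinned toolkit installed and Docker restarted, the workflow launches its job container with the GPU_FLAG value shown in the step environment below (--gpus all -e NVIDIA_DRIVER_CAPABILITIES=all). A quick, hedged way to verify that setup by hand on the runner; the CUDA base image tag is only an example, any GPU-aware image would do:

  # Should print the same A10G table as the host-side nvidia-smi above.
  docker run --rm --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all \
    nvidia/cuda:12.8.0-base-ubuntu22.04 nvidia-smi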
2025-07-02T08:03:22.4335128Z ##[group]Run pmeier/pytest-results-action@a2c1430e2bddadbad9f49a6f9b879f062c6b19b1
2025-07-02T08:03:22.4335591Z with:
2025-07-02T08:03:22.4335868Z path: /home/ec2-user/actions-runner/_work/_temp/test-results
2025-07-02T08:03:22.4336239Z fail-on-empty: false
2025-07-02T08:03:22.4336454Z env:
2025-07-02T08:03:22.4336665Z DOCKER_IMAGE: nvidia/cudagl:11.4.0-base
2025-07-02T08:03:22.4336952Z REPOSITORY: pytorch/rl
2025-07-02T08:03:22.4337478Z PR_NUMBER: 3030
2025-07-02T08:03:22.4339642Z SCRIPT: if [[ "refs/pull/3030/merge" =~ release/* ]]; then export RELEASE=1 export TORCH_VERSION=stable else export RELEASE=0 export TORCH_VERSION=nightly fi set -euo pipefail export PYTHON_VERSION="3.9" export CU_VERSION="cu117" export TAR_OPTIONS="--no-same-owner" export UPLOAD_CHANNEL="nightly" export TF_CPP_MIN_LOG_LEVEL=0 export TD_GET_DEFAULTS_TO_NONE=1 bash .github/unittest/linux_libs/scripts_llm/setup_env.sh bash .github/unittest/linux_libs/scripts_llm/install.sh bash .github/unittest/linux_libs/scripts_llm/run_test.sh bash .github/unittest/linux_libs/scripts_llm/post_process.sh
2025-07-02T08:03:22.4341971Z RUNNER_ARTIFACT_DIR: /home/ec2-user/actions-runner/_work/_temp/artifacts
2025-07-02T08:03:22.4342509Z RUNNER_TEST_RESULTS_DIR: /home/ec2-user/actions-runner/_work/_temp/test-results
2025-07-02T08:03:22.4343010Z RUNNER_DOCS_DIR: /home/ec2-user/actions-runner/_work/_temp/docs
2025-07-02T08:03:22.4343423Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-07-02T08:03:22.4343761Z ##[endgroup]
2025-07-02T08:03:22.5166091Z ##[group]Run # Only do these steps if we actually want to upload an artifact
2025-07-02T08:03:22.5166649Z # Only do these steps if we actually want to upload an artifact
2025-07-02T08:03:22.5167065Z if [[ -n "${UPLOAD_ARTIFACT_NAME}" ]]; then
2025-07-02T08:03:22.5167570Z  # If the default execution path is followed then we should get a wheel in the dist/ folder
2025-07-02T08:03:22.5168118Z  # attempt to just grab whatever is in there and scoop it all up
2025-07-02T08:03:22.5168565Z  if find "dist/" -name "*.whl" >/dev/null 2>/dev/null; then
2025-07-02T08:03:22.5168957Z  mv -v dist/*.whl "${RUNNER_ARTIFACT_DIR}/"
2025-07-02T08:03:22.5169254Z  fi
2025-07-02T08:03:22.5169505Z  if [[ -d "artifacts-to-be-uploaded" ]]; then
2025-07-02T08:03:22.5169908Z  mv -v artifacts-to-be-uploaded/* "${RUNNER_ARTIFACT_DIR}/"
2025-07-02T08:03:22.5170440Z  fi
2025-07-02T08:03:22.5170657Z fi
2025-07-02T08:03:22.5170836Z  
2025-07-02T08:03:22.5171024Z upload_docs=0
2025-07-02T08:03:22.5171379Z # Check if there are files in the documentation folder to upload, note that
2025-07-02T08:03:22.5171798Z # empty folders do not count
2025-07-02T08:03:22.5172254Z if find "${RUNNER_DOCS_DIR}" -mindepth 1 -maxdepth 1 -type f | read -r; then
2025-07-02T08:03:22.5172789Z  # TODO: Add a check here to test if on ec2 because if we're not on ec2 then this
2025-07-02T08:03:22.5173236Z  # upload will probably not work correctly
2025-07-02T08:03:22.5173539Z  upload_docs=1
2025-07-02T08:03:22.5173763Z fi
2025-07-02T08:03:22.5174036Z echo "upload-docs=${upload_docs}" >> "${GITHUB_OUTPUT}"
2025-07-02T08:03:22.5188353Z shell: /usr/bin/bash -e {0}
2025-07-02T08:03:22.5188590Z env:
2025-07-02T08:03:22.5188799Z DOCKER_IMAGE: nvidia/cudagl:11.4.0-base
2025-07-02T08:03:22.5189099Z REPOSITORY: pytorch/rl
2025-07-02T08:03:22.5189345Z PR_NUMBER: 3030
2025-07-02T08:03:22.5191500Z SCRIPT: if [[ "refs/pull/3030/merge" =~ release/* ]]; then export RELEASE=1 export TORCH_VERSION=stable else export RELEASE=0 export TORCH_VERSION=nightly fi set -euo pipefail export PYTHON_VERSION="3.9" export CU_VERSION="cu117" export TAR_OPTIONS="--no-same-owner" export UPLOAD_CHANNEL="nightly" export TF_CPP_MIN_LOG_LEVEL=0 export TD_GET_DEFAULTS_TO_NONE=1 bash .github/unittest/linux_libs/scripts_llm/setup_env.sh bash .github/unittest/linux_libs/scripts_llm/install.sh bash .github/unittest/linux_libs/scripts_llm/run_test.sh bash .github/unittest/linux_libs/scripts_llm/post_process.sh
2025-07-02T08:03:22.5193788Z RUNNER_ARTIFACT_DIR: /home/ec2-user/actions-runner/_work/_temp/artifacts
2025-07-02T08:03:22.5194317Z RUNNER_TEST_RESULTS_DIR: /home/ec2-user/actions-runner/_work/_temp/test-results
2025-07-02T08:03:22.5194814Z RUNNER_DOCS_DIR: /home/ec2-user/actions-runner/_work/_temp/docs
2025-07-02T08:03:22.5195229Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-07-02T08:03:22.5195561Z UPLOAD_ARTIFACT_NAME:
2025-07-02T08:03:22.5195799Z ##[endgroup]
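In the upload script above, the construct "find ... | read -r" is a compact non-emptiness test: read -r succeeds only if find printed at least one matching file, so an empty documentation folder leaves upload_docs at 0. A small hedged illustration of the same idiom outside the workflow; the fallback path is illustrative, not part of the run:

  # The pipeline's exit status is read's status: success only if find
  # produced at least one line, i.e. the directory holds a regular file.
  docs_dir="${RUNNER_DOCS_DIR:-/tmp/docs}"
  if find "${docs_dir}" -mindepth 1 -maxdepth 1 -type f | read -r; then
    echo "upload-docs=1"
  else
    echo "upload-docs=0"
  fi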
2025-07-02T08:03:22.5234255Z ##[error]An error occurred trying to start process '/usr/bin/bash' with working directory '/home/ec2-user/actions-runner/_work/rl/rl/pytorch/rl'. No such file or directory
2025-07-02T08:03:22.5415029Z Post job cleanup.
2025-07-02T08:03:22.6406429Z [command]/usr/bin/git version
2025-07-02T08:03:22.6453371Z git version 2.47.1
2025-07-02T08:03:22.6495406Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/39fb4638-3614-432e-8b64-67b34bae6f7a' before making global git config changes
2025-07-02T08:03:22.6496779Z Adding repository directory to the temporary git global config as a safe directory
2025-07-02T08:03:22.6509560Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/rl/rl/test-infra
2025-07-02T08:03:22.6548862Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
2025-07-02T08:03:22.6588272Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :"
2025-07-02T08:03:22.6995026Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader
2025-07-02T08:03:22.7022583Z http.https://github.com/.extraheader
2025-07-02T08:03:22.7035323Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader
2025-07-02T08:03:22.7070127Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :"
2025-07-02T08:03:22.7530806Z A job completed hook has been configured by the self-hosted runner administrator
2025-07-02T08:03:22.7626965Z ##[group]Run '/home/ec2-user/runner-scripts/after_job.sh'
2025-07-02T08:03:22.7635516Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-07-02T08:03:22.7635867Z ##[endgroup]
2025-07-02T08:03:22.7753536Z [!ALERT!] Swap in detected! [!ALERT!]
2025-07-02T08:03:34.0408962Z [!ALERT!] Swap out detected [!ALERT!]
2025-07-02T08:03:51.8107118Z Cleaning up orphan processes
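The post-job cleanup above strips the http.https://github.com/.extraheader setting that actions/checkout injects to authenticate fetches, so the token does not linger in the local git config between jobs on this self-hosted runner. A hedged sketch of the same cleanup for a single checkout and its submodules, mirroring the commands in the log:

  # Remove any persisted auth header from the repo and all submodules;
  # "|| :" keeps the step green when the key is already absent.
  git config --local --unset-all 'http.https://github.com/.extraheader' || :
  git submodule foreach --recursive \
    "git config --local --unset-all 'http.https://github.com/.extraheader' || :"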